GitHub - raphaelsty/knowledge: Online library for AI

An opinionated reading list of 450+ people who shape AI, science, and software today. A place to learn.

No ads, no algorithm, just what they read.

Knowledge is a personal library. A place to keep what you found interesting on GitHub, on X, in a blog, in a paper, and to read it when you have time. A talk that mattered last week is still worth watching this week. A paper from two months ago still teaches you what you came for. Build your own library, or sit in someone else's for an hour. Spend that hour with Andrej Karpathy's bookmarks and you learn what fifteen years of ML looks like.

Tap the heart on any card and the doc lands in your library, indexed and searchable. Every contributor has a personal page that reads like a curated bookshelf: their tweets, their stars, the papers they wrote, the videos they show up in. Browse it the way you'd browse a friend's bookmarks folder.

Libraries to visit

A few rooms worth walking into.

…or wander through all 450+ libraries.

Search

Type a query. ColBERT searches the actual contents of every doc, not just titles, and ranks them by how well the words match. Search one library, several at once, or the whole shared corpus.
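ColBERT-style models are usually scored with late interaction (MaxSim): each query token keeps its best-matching document token, and those maxima are summed. The sketch below illustrates that idea with made-up two-dimensional vectors. It is not the code this repo runs, and the names `normalize`, `maxsim`, `doc_a`, and `doc_b` are invented for the example.

```python
from math import sqrt

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, keep its best-matching
    document token, then sum those maxima."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )

# Toy "token embeddings", invented for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # has a close match for both query tokens
doc_b = [[1.0, 0.0], [0.8, 0.2]]              # matches only the first query token well

# Rank documents by descending MaxSim score.
ranked = sorted(
    {"doc_a": doc_a, "doc_b": doc_b}.items(),
    key=lambda kv: maxsim(query, kv[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Because every query token has to find its own match, a document that covers the whole query outranks one that matches a single token very well.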

MCP

The API exposes an MCP server at /mcp with fifteen tools. Twelve are public; three require a bearer token you mint at /profile.

Search & discover

search: query a single library
search_across: query several libraries at once
search_personalities: find libraries by description
find_similar: docs related to one you've read
latest: most recent docs in a library
feed: chronological cross-library feed
intersect_documents: docs shared between libraries

Catalog

list_personalities: every library
list_sources: sources for a library
list_tags: tags for a library
get_personality: one library's metadata
get_document: one doc by URL

Authenticated

my_library: your saved docs
my_timeline: your activity feed
save_document: save a doc to your library

claude mcp add knowledge --transport http https://knowledge-web.org/mcp \
  --header "Authorization: Bearer kn_..."
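For agents that talk to the server directly, MCP tool invocations are JSON-RPC 2.0 messages with the method `tools/call`. The sketch below only builds such a message and its HTTP headers; it sends nothing and skips the initialization handshake a real client performs first. The argument names `library` and `query` and the library id `karpathy` are assumptions made for illustration; the actual input schema of each tool should be read from the server's `tools/list` response.

```python
import json

MCP_URL = "https://knowledge-web.org/mcp"  # endpoint taken from the command above

def build_tool_call(tool, arguments, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` message in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

def build_headers(token=None):
    """HTTP headers for an MCP request over HTTP. The bearer token is only
    needed for the three authenticated tools."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers

# Hypothetical arguments: the real parameter names may differ.
message = build_tool_call(
    "search",
    {"library": "karpathy", "query": "speculative decoding"},
)
print(json.dumps(message, indent=2))
```

The twelve public tools would use `build_headers()` with no token; `my_library`, `my_timeline`, and `save_document` would need `build_headers(token)` with a token minted at /profile.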

How it works

The pipeline runs all day, walking through each personality's sources in a continuous loop: GitHub stars, X posts, Hacker News submissions, arXiv papers, Hugging Face likes, Reddit, Stack Overflow, Wikipedia, the rest of it. Each document gets cleaned, tagged, written to Postgres. A separate indexer daemon picks up new rows and embeds them with ColBERT, so search stays current without blocking the main pipeline. When you type a query, the API serves ranked results from a next-plaid PLAID index sitting on local disk. Your browser does a second pass with an unquantized ColBERT running in WASM to re-rank what landed. Soup to nuts, the whole stack lives in this repo: sources/ is Python (fetchers and orchestrator), api/ is Rust (search, ingest, auth, MCP), web/ is plain HTML and JS.
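The split between the pipeline and the indexer daemon can be pictured as a writer that only inserts rows and a separate pass that picks up whatever has not been embedded yet. The sketch below is a guess at that pattern, not the repo's implementation: it uses an in-memory SQLite database in place of Postgres, and the `documents` table, the `indexed` flag, and the `embed` and `index_pending` helpers are all hypothetical.

```python
import sqlite3

def embed(text):
    """Placeholder for the ColBERT encoder: returns a fake one-number 'embedding'."""
    return [float(len(text))]

def index_pending(conn, index, batch_size=100):
    """One pass of the indexer: embed rows not yet indexed, then mark them done."""
    rows = conn.execute(
        "SELECT id, content FROM documents WHERE indexed = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for doc_id, content in rows:
        index[doc_id] = embed(content)
        conn.execute("UPDATE documents SET indexed = 1 WHERE id = ?", (doc_id,))
    conn.commit()
    return len(rows)

# SQLite stands in for Postgres here so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, content TEXT, indexed INTEGER DEFAULT 0)"
)

# The pipeline side only writes rows; it never waits for embedding.
conn.executemany(
    "INSERT INTO documents (content) VALUES (?)",
    [("a paper",), ("a repo",)],
)
conn.commit()

index = {}
first_pass = index_pending(conn, index)   # picks up both new rows
second_pass = index_pending(conn, index)  # finds nothing left to do
print(first_pass, second_pass)
```

A real daemon would call something like `index_pending` in a loop with a sleep between passes, which is what keeps slow embedding work off the ingestion path.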

Why it helps

Most platforms compete for your attention with infinite feeds, ads between every post, notifications you didn't ask for, recommendations from an algorithm that learned to manipulate you. Knowledge does the opposite: small, finite libraries you can return to. Use it to research a topic across experts. Search 450+ libraries at once for "speculative decoding" and you get curated context instead of random Google noise. Browse Karpathy's GitHub stars, Yann LeCun's papers, Geoffrey Hinton's interviews, all in one place. Stop doomscrolling X. The site compresses someone's year of tweets into a static page you can read once and close. Sign in to save what matters, search your own library, mint a token to wire the MCP server into Claude, Cursor, or any agent that speaks MCP.

Under the hood

Knowledge has always been a showcase for the information retrieval tools I'm building. It started four years ago on a cherche backend and now runs on next-plaid and pylate-rs, the same search stack behind ColGREP, the semantic code-search tool. The API is a single Rust binary, the pipeline is Python, the frontend is plain HTML and JS. Everything runs on a single Hetzner VPS.

The pipeline parses about a dozen sources: GitHub stars, X posts and likes, Hacker News submissions and comments, arXiv, Google Scholar, DBLP, Hugging Face likes, YouTube channels, Zotero libraries, Reddit, Stack Overflow, Wikipedia references, plus any blog you can point at via RSS or sitemap. As of today that's 450+ personal libraries, around 440,000 documents indexed.

So yes, when you type a query a quantized ColBERT runs on the server's CPU against a next-plaid index, and then on your phone an unquantized ColBERT in WASM re-ranks the results. The browser-side full-precision re-rank is, as far as I know, an original trick.
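The two-stage idea can be shown in miniature: a cheap, lower-precision pass shortlists candidates, then a full-precision pass re-orders only that shortlist. The sketch below is a simplification under stated assumptions. It uses one vector per document instead of ColBERT's per-token vectors, and coarse rounding stands in for whatever precision loss the first stage has in the real system. The functions `quantize`, `first_pass`, and `rerank` and the toy vectors are invented for the example.

```python
def dot(a, b):
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def quantize(vec, levels=4):
    """Crude stand-in for a lower-precision stage: snap values to a coarse grid."""
    return [round(x * levels) / levels for x in vec]

def first_pass(query, docs, k):
    """Server side: score every doc with coarse vectors, keep the top k ids."""
    ordered = sorted(docs, key=lambda d: dot(query, quantize(docs[d])), reverse=True)
    return ordered[:k]

def rerank(query, docs, candidates):
    """Client side: re-score only the shortlisted docs at full precision."""
    return sorted(candidates, key=lambda d: dot(query, docs[d]), reverse=True)

# Toy vectors chosen so that rounding changes the order of "a" and "b".
query = [1.0, 0.5]
docs = {
    "a": [0.70, 0.10],
    "b": [0.60, 0.36],
    "c": [0.05, 0.05],
}

shortlist = first_pass(query, docs, k=2)
final = rerank(query, docs, shortlist)
print(shortlist, final)
```

The coarse pass is good enough to drop `c`, but it misorders `a` and `b`; the full-precision pass fixes that while touching only two documents instead of the whole corpus.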

Cost and hosting

Free to use, free to read. The whole site runs on a single Hetzner CX33 in Helsinki: 4 vCPUs, 8 GB RAM, around $15 a month all in. No CDN, no managed Postgres, no Cloudflare proxy in front of the app. The 3.8 GB ColBERT index sits on local disk and the API serves it directly. To self-host you clone the repo, set five env vars, point a domain at the box, push to main. PolyForm Noncommercial 1.0.0 covers personal and educational use.

License

PolyForm Noncommercial 1.0.0. Free to use, modify, and self-host for non-commercial purposes. Get in touch for anything else.

Citation

@software{sourty2026knowledge,
  author  = {Sourty, Raphaël},
  title   = {Knowledge: a library for the internet},
  year    = {2026},
  url     = {https://github.com/raphaelsty/knowledge},
  license = {PolyForm-Noncommercial-1.0.0}
}
